Improved Distributed Principal Component Analysis
Abstract
We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of the techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest.
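To make the setting concrete, here is a minimal sketch of a naive exact baseline, not the algorithm of the paper. It assumes each server sends its local d x d Gram matrix to a coordinator, which sums them and extracts the leading direction by power iteration. The function names (`local_gram`, `merge_grams`, `top_direction`) are hypothetical, centering of the data is omitted for brevity, and only the top component (k = 1) is computed. The baseline works because the Gram matrix of the union of the point sets equals the sum of the per-server Gram matrices; its communication cost grows with d squared per server, which is the kind of cost that approximate methods aim to reduce.

```python
import math
from typing import List

Matrix = List[List[float]]


def local_gram(points: Matrix) -> Matrix:
    """Computed on one server: P^T P for its points (one point per row)."""
    d = len(points[0])
    gram = [[0.0] * d for _ in range(d)]
    for p in points:
        for i in range(d):
            for j in range(d):
                gram[i][j] += p[i] * p[j]
    return gram


def merge_grams(grams: List[Matrix]) -> Matrix:
    """Computed on the coordinator: entrywise sum of the local Gram matrices."""
    d = len(grams[0])
    return [[sum(g[i][j] for g in grams) for j in range(d)] for i in range(d)]


def top_direction(gram: Matrix, iters: int = 200) -> List[float]:
    """Power iteration for the leading eigenvector of a symmetric PSD matrix.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    d = len(gram)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no direction to recover
            return v
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Two hypothetical servers, each holding two points in the plane.
    server_a = [[1.0, 0.9], [2.0, 2.1]]
    server_b = [[-1.0, -1.1], [3.0, 2.9]]
    merged = merge_grams([local_gram(server_a), local_gram(server_b)])
    print(top_direction(merged))
```

The coordinator never sees the raw points, only the d x d summaries, so the amount of data sent is independent of how many points each server holds.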
Similar resources
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
Principal component analysis or factor analysis different wording or methodological fault?
This article has no abstract.
Development of a cell formation heuristic by considering realistic data using principal component analysis and Taguchi’s method
Over the last four decades, numerous cell formation algorithms have been developed and tested, yet this research remains of interest to this day. Forming appropriate manufacturing cells is the first step in designing a cellular manufacturing system. In cellular manufacturing, consideration of manufacturing flexibility and production-related data is vital for cell formation....
Feature reduction of hyperspectral images: Discriminant analysis and the first principal component
When the number of training samples is limited, feature reduction plays an important role in classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices. The proposed method, called DA-PC1, copes with the small sample size problem and has...
An Empirical Comparison between Grade of Membership and Principal Component Analysis
It is the purpose of this paper to contribute to the discussion initiated by Wachter about the parallelism between principal component (PC) and a typological grade of membership (GoM) analysis. The author tested empirically the close relationship between both analyses in a low-dimensional framework comprising up to nine dichotomous variables and two typologies. Our contribution to the subject is also...
Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...